Authenticating Drivers

ABSTRACT

One or more devices in a data analysis computing system may be configured to receive and analyze movement data and determine driving trips based on the received data. The driving trips may be used along with the movement data to authenticate drivers based on a determined driver profile.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of and claims priority to U.S. patent application Ser. No. 16/255,155, now U.S. Pat. No. 10,938,825, filed Jan. 23, 2019, and entitled “Authenticating Drivers,” which is a continuation of U.S. patent application Ser. No. 15/458,735, now U.S. Pat. No. 10,250,611, filed Mar. 14, 2017, and entitled “Authenticating Drivers,” the content of each of which is incorporated herein by reference in its entirety.

TECHNICAL FIELD

Various aspects of the disclosure generally relate to systems and methods of collecting data from a computing device to determine driver authentication. Specifically, various aspects relate to systems and methods of collecting and analyzing global positioning system (GPS) data and movement data to authenticate drivers.

BACKGROUND

The ability to collect and analyze data to determine who is driving a vehicle has many valuable applications, for example, relating to vehicle and driver insurance, vehicle financing, product safety and marketing, government and law enforcement, and various other applications in other industries. The goal of driver detection, or driver fingerprinting, is to determine whether a user recording a car trip with a computing device is a driver or a passenger of the vehicle. If driver profiles are known or have been determined for all potential drivers of a vehicle, then the solution becomes one of driver identification. If all potential drivers are known, the solution becomes a forced-choice task that determines which driver profile in the database is the closest match.

In contrast, solving the problem of driver authentication involves determining the driver from a pool of drivers that may be largely unknown. Solving such a problem is needed and would have many valuable applications. Further complications that need to be overcome in the context of driver authentication include making such determinations based on unsupervised, i.e., unlabeled, data. Additionally, there is a need to determine driver authentication based on a method which is agnostic to road, traffic, and weather conditions. Finally, a need exists for a method and system to determine driver authentication based on collected real-time data. Such real-time data may be collected in non-uniform/varying road, traffic, and weather conditions.

SUMMARY

The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosure. The summary is not an extensive overview of the disclosure. It is neither intended to identify key or critical elements of the disclosure nor to delineate the scope of the disclosure. The following summary merely presents some concepts of the disclosure in a simplified form as a prelude to the description below.

Aspects of the disclosure relate to systems, apparatuses, computer-implemented methods, and computer-readable media for receiving and analyzing GPS and movement data to identify driving patterns and drivers based on the data. In some cases, the movement data may correspond to acceleration data, speed data, or other movement data collected by various movement sensors in one or more mobile devices, such as smartphones, tablet computers, and on-board vehicle systems.

According to some aspects of the disclosure, data such as GPS data and movement data may be received and used to determine whether a user is a driver or passenger of a vehicle. A small number of labeled trips may be used to generate a user driver profile, e.g., by sampling routine trips that have a high likelihood of the user driving or by asking the user to label a small number of trips. Once this small subset of labeled trips is obtained, a driving pattern may be determined for the driver. The determined driving pattern may be used to generate a driving profile for the driver. In an embodiment, the generated driving profile from a new test trip may be compared to previously generated driver profiles and a stored background driver profile in order to authenticate the driver identity based on the comparison.
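As a non-limiting illustration, the comparison of a test trip against a driver profile and a background driver profile may be sketched as follows. The sketch assumes that each profile is summarized by a per-feature mean and standard deviation, and that a trip is accepted when it lies closer to the claimed driver's profile than to the background profile. The `Profile` type, the feature names, the distance measure, and the `margin` parameter are hypothetical and are not specified by the disclosure.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict


@dataclass
class Profile:
    """Per-feature mean and standard deviation (hypothetical representation)."""
    mean: Dict[str, float]
    std: Dict[str, float]


def distance(trip: Dict[str, float], profile: Profile) -> float:
    """Standardized Euclidean distance between a trip and a profile."""
    total = 0.0
    for name, value in trip.items():
        spread = max(profile.std[name], 1e-9)  # guard against zero spread
        total += ((value - profile.mean[name]) / spread) ** 2
    return sqrt(total)


def authenticate(trip: Dict[str, float], driver: Profile,
                 background: Profile, margin: float = 0.0) -> bool:
    """Accept the trip only if it is closer to the driver than to the background."""
    return distance(trip, driver) + margin < distance(trip, background)
```

In such a sketch, a larger `margin` would make acceptance stricter; other similarity measures may equally be used.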

According to some aspects of the disclosure, driving patterns may be determined based on statistical analyses of the GPS and movement data. Trip attributes such as number of stopping points during a trip, number of turns, acceleration rate, deceleration rate, time of day, etc. may be used to determine driving patterns. Other features and advantages of the disclosure will be apparent from the additional description provided herein.
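By way of example only, several of the trip attributes named above may be derived from time-stamped samples as sketched below. The sample layout (time in seconds, speed in meters per second, heading in degrees) and the `stop_speed` and `turn_angle` thresholds are assumptions made for illustration; counting a turn from the heading change between two consecutive samples is a deliberate simplification.

```python
from typing import Dict, List, Tuple

# Each sample: (time in seconds, speed in m/s, heading in degrees). Assumed layout.
Sample = Tuple[float, float, float]


def heading_change(a: float, b: float) -> float:
    """Smallest absolute angle between two headings, in degrees."""
    diff = abs(b - a) % 360.0
    return min(diff, 360.0 - diff)


def mean(values: List[float]) -> float:
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def trip_attributes(samples: List[Sample],
                    stop_speed: float = 0.5,
                    turn_angle: float = 45.0) -> Dict[str, float]:
    """Count stops and turns and average the acceleration and deceleration rates."""
    stops, turns = 0, 0
    accels: List[float] = []
    decels: List[float] = []
    for (t0, v0, h0), (t1, v1, h1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order samples
        rate = (v1 - v0) / dt
        if rate > 0:
            accels.append(rate)
        elif rate < 0:
            decels.append(-rate)
        if v0 > stop_speed and v1 <= stop_speed:
            stops += 1  # vehicle came to rest between these samples
        if heading_change(h0, h1) >= turn_angle:
            turns += 1
    return {"stops": stops, "turns": turns,
            "mean_accel": mean(accels), "mean_decel": mean(decels)}
```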

BRIEF DESCRIPTION OF THE DRAWINGS

A more complete understanding of the present invention and the advantages thereof may be acquired by referring to the following description in consideration of the accompanying drawings, in which like reference numbers indicate like features, and wherein:

FIG. 1 illustrates a network environment and computer systems that may be used to implement aspects of the disclosure.

FIG. 2 is a diagram illustrating the components of an example movement data and driving data analysis system, according to one or more aspects of the disclosure.

FIG. 3 is a diagram illustrating routine level features according to one or more aspects of the disclosure.

FIG. 4 is a diagram illustrating control level features according to one or more aspects of the disclosure.

FIG. 5 is a flow diagram illustrating an example process of determining a driver based on GPS and movement data according to one or more aspects of the disclosure.

DETAILED DESCRIPTION

In the following description of the various embodiments, reference is made to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration, various embodiments of the disclosure that may be practiced. It is to be understood that other embodiments may be utilized.

As will be appreciated by one of skill in the art upon reading the following disclosure, various aspects described herein may be embodied as a method, a computer system, or a computer program product. Accordingly, those aspects may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, such aspects may take the form of a computer program product stored by one or more computer-readable storage media having computer-readable program code, or instructions, embodied in or on the storage media. Any suitable computer readable storage media may be utilized, including hard disks, CD-ROMs, optical storage devices, magnetic storage devices, and/or any combination thereof. In addition, various signals representing data or events as described herein may be transferred between a source and a destination in the form of electromagnetic waves traveling through signal-conducting media such as metal wires, optical fibers, and/or wireless transmission media (e.g., air and/or space).

FIG. 1 illustrates a block diagram of a computing device (or system) 101 in a computer system 100 that may be used according to one or more illustrative embodiments of the disclosure. The device 101 may have a processor 103 for controlling overall operation of the device 101 and its associated components, including RAM 105, ROM 107, input/output module 109, and memory 115. The computing device 101, along with one or more additional devices (e.g., terminals 141 and 151, security and integration hardware 160) may correspond to any of multiple systems or devices, such as a mobile computing device or a driving data analysis server, configured as described herein for receiving and analyzing movement data from mobile device movement sensors, and identifying driving patterns and drivers associated with the movement data.

Input/Output (I/O) 109 may include a microphone, keypad, touch screen, and/or stylus through which a user of the computing device 101 may provide input, and may also include one or more of a speaker for providing audio output and a video display device for providing textual, audiovisual and/or graphical output. Software may be stored within memory 115 and/or storage to provide instructions to processor 103 for enabling device 101 to perform various actions. For example, memory 115 may store software used by the device 101, such as an operating system 117, application programs 119, and an associated internal database 121. The various hardware memory units in memory 115 may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Certain devices/systems within a movement data/driving data analysis system may have minimum hardware requirements in order to support sufficient storage capacity, analysis capacity, network communication, etc. For instance, in some embodiments, one or more nonvolatile hardware memory units having a minimum size (e.g., at least 1 gigabyte (GB), 2 GB, 5 GB, etc.), and/or one or more volatile hardware memory units having a minimum size (e.g., 256 megabytes (MB), 512 MB, 1 GB, etc.) may be used in a device 101 (e.g., an insurance provider server 101, a movement data/driving data analysis device 101, etc.), in order to store and/or execute a movement data analysis software application, receive and process sufficient amounts of movement data from various sensors at a determined data sampling rate, and analyze movement data to identify driving patterns and determine associated drivers, etc. Memory 115 also may include one or more physical persistent memory devices and/or one or more non-persistent memory devices. Memory 115 may include, but is not limited to, random access memory (RAM) 105, read only memory (ROM) 107, electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by processor 103.

Processor 103 may include a single central processing unit (CPU), which may be a single-core or multi-core processor (e.g., dual-core, quad-core, etc.), or may include multiple CPUs. Processor(s) 103 may have various bit sizes (e.g., 16-bit, 32-bit, 64-bit, 96-bit, 128-bit, etc.) and various processor speeds (ranging from 100 MHz to 5 GHz or faster). Processor(s) 103 and its associated components may allow the system 101 to execute a series of computer-readable instructions, for example, to execute a movement data analysis software application that receives and stores data from mobile device movement sensors, analyzes the movement data, and determines driving patterns and associated drivers based on the movement data.

The computing device (e.g., a mobile computing device, a driving data analysis server, etc.) may operate in a networked environment 100 supporting connections to one or more remote computers, such as terminals 141 and 151. The terminals 141 and 151 may be personal computers, servers (e.g., web servers, database servers), or mobile communication devices (e.g., mobile phones, portable computing devices, on-board vehicle computing systems, and the like), and may include some or all of the elements described above with respect to the computing device 101. The network connections depicted in FIG. 1 include a local area network (LAN) 125, a wide area network (WAN) 129, and a wireless telecommunications network 133, but may also include other networks. When used in a LAN networking environment, the computing device 101 may be connected to the LAN 125 through a network interface or adapter 123. When used in a WAN networking environment, the device 101 may include a modem 127 or other means for establishing communications over the WAN 129, such as network 131 (e.g., the Internet). When used in a wireless telecommunications network 133, the device 101 may include one or more transceivers, digital signal processors, and additional circuitry and software for communicating with wireless computing devices 141 (e.g., mobile phones, portable customer computing devices, on-board vehicle computing systems, etc.) via one or more network devices 135 (e.g., base transceiver stations) in the wireless network 133.

Also illustrated in FIG. 1 is a security and integration layer 160, through which communications may be sent and managed between the device 101 (e.g., a user's mobile device, a driving data analysis system, etc.) and the remote devices (141 and 151) and remote networks (125, 129, and 133). The security and integration layer 160 may comprise one or more separate computing devices, such as web servers, authentication servers, and/or various networking components (e.g., firewalls, routers, gateways, load balancers, etc.), having some or all of the elements described above with respect to the computing device 101. As an example, a security and integration layer 160 of a driving data analysis server operated by an insurance provider, financial institution, governmental entity, or other organization, may comprise a set of web application servers configured to use secure protocols and to insulate the server 101 from external devices 141 and 151. In some cases, the security and integration layer 160 may correspond to a set of dedicated hardware and/or software operating at the same physical location and under the control of the same entities as driving data analysis server 101. For example, layer 160 may correspond to one or more dedicated web servers and network hardware in an organizational datacenter or in a cloud infrastructure supporting a cloud-based driving data analysis system. In other examples, the security and integration layer 160 may correspond to separate hardware and software components which may be operated at a separate physical location and/or by a separate entity.

As discussed below, the data transferred to and from various devices in the computing system 100 may include secure and sensitive data, such as movement data, driving pattern data, and/or driving behavior data associated with a driver or vehicle. Therefore, it may be desirable to protect transmissions of such data by using secure network protocols and encryption, and also to protect the integrity of the data when stored in a database or other storage in a mobile device, driving data analysis server, or other computing devices in the system 100, by using the security and integration layer 160 to authenticate users and restrict access by unknown or unauthorized users. In various implementations, security and integration layer 160 may provide, for example, a file-based integration scheme or a service-based integration scheme for transmitting data between the various devices in a system 100. Data may be transmitted through the security and integration layer 160, using various network communication protocols. Secure data transmission protocols and/or encryption may be used in file transfers to protect the integrity of the driving data, for example, File Transfer Protocol (FTP), Secure File Transfer Protocol (SFTP), and/or Pretty Good Privacy (PGP) encryption. In other examples, one or more web services may be implemented within the various devices 101 in the system 100 and/or the security and integration layer 160. The web services may be accessed by authorized external devices and users to support input, extraction, and manipulation of the data (e.g., movement data, location data, driving behavior data, etc.) between the various devices 101 in the system 100. Web services built to support system 100 may be cross-domain and/or cross-platform, and may be built for enterprise use. Such web services may be developed in accordance with various web service standards, such as the Web Service Interoperability (WS-I) guidelines. In some examples, a movement data and/or driving data web service may be implemented in the security and integration layer 160 using the Secure Sockets Layer (SSL) or Transport Layer Security (TLS) protocol to provide secure connections between servers 101 and various clients 141 and 151 (e.g., mobile devices, data analysis servers, etc.). HTTP may be layered over SSL or TLS (i.e., HTTPS) to provide authentication and confidentiality. In other examples, such web services may be implemented using the WS-Security standard, which provides for secure SOAP messages using XML encryption. In still other examples, the security and integration layer 160 may include specialized hardware for providing secure web services. For example, secure network appliances in the security and integration layer 160 may include built-in features such as hardware-accelerated SSL and HTTPS, WS-Security, and firewalls. Such specialized hardware may be installed and configured in the security and integration layer 160 in front of the web servers, so that any external devices may communicate directly with the specialized hardware.
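As one hedged illustration of the secure connections described above, a client 141 or 151 may construct a TLS context that verifies the server certificate before any movement data is sent. The sketch below uses the standard Python `ssl` module; the function name `make_client_context` is hypothetical, and the choice to require TLS 1.2 or later (rather than legacy SSL) is an assumption of this sketch and not a requirement stated in the disclosure.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a TLS client context that verifies the server certificate."""
    # create_default_context() enables certificate and host name verification.
    context = ssl.create_default_context()
    # Assumption of this sketch: refuse legacy SSL and early TLS versions.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

Such a context could be supplied, for instance, through the `context` argument of `http.client.HTTPSConnection` when connecting to a server 101; the host name would be deployment-specific.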

Although not shown in FIG. 1, various elements within memory 115 or other components in system 100, may include one or more caches, for example, CPU caches used by the processing unit 103, page caches used by the operating system 117, disk caches of a hard drive, and/or database caches used to cache content from database 121. For embodiments including a CPU cache, the CPU cache may be used by one or more processors in the processing unit 103 to reduce memory latency and access time. In such examples, a processor 103 may retrieve data from or write data to the CPU cache rather than reading/writing to memory 115, which may improve the speed of these operations. In some examples, a database cache may be created in which certain data from a database 121 (e.g., a movement data database, a driving pattern database, etc.) is cached in a separate smaller database on an application server separate from the database server. For instance, in a multi-tiered application, a database cache on an application server can reduce data retrieval and data manipulation time by not needing to communicate over a network with a back-end database server. These types of caches and others may be included in various embodiments, and may provide potential advantages in certain implementations of movement data and driving data collection and analysis systems, such as faster response times and less dependence on network conditions when transmitting/receiving movement data analysis software applications (or application updates), movement data, driving pattern data, etc.
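The database cache described above may be pictured, purely as an illustrative sketch, as a read-through cache held on the application server. The `ReadThroughCache` class and the `fetch` callable standing in for a query to the back-end database server are hypothetical names; eviction and expiry policies are omitted for brevity.

```python
from typing import Any, Callable, Dict, Hashable


class ReadThroughCache:
    """Keeps back-end results locally so repeated reads avoid the network."""

    def __init__(self, fetch: Callable[[Hashable], Any]) -> None:
        self._fetch = fetch  # stands in for a query to the database server
        self._store: Dict[Hashable, Any] = {}
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        """Return the cached value, querying the back end only on a miss."""
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._fetch(key)
        return self._store[key]

    def invalidate(self, key: Hashable) -> None:
        """Drop a cached entry, e.g., after the back-end record changes."""
        self._store.pop(key, None)
```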

It will be appreciated that the network connections shown are illustrative and other means of establishing a communications link between the computers may be used. The existence of any of various network protocols such as TCP/IP, Ethernet, FTP, HTTP and the like, and of various wireless communication technologies such as GSM, CDMA, WiFi, and WiMAX, is presumed, and the various computer devices and system components described herein may be configured to communicate using any of these network protocols or technologies.

Additionally, one or more application programs 119 may be used by the various computing devices 101 within a movement data and/or driving data analysis system 100 (e.g., movement data analysis software applications), including computer executable instructions for receiving and storing movement data from mobile device sensors, analyzing the movement data to identify driving patterns and drivers associated with the movement data, and performing other related functions as described herein.

FIG. 2 is a diagram of an illustrative movement data/driving data analysis system 200. In this example system diagram, a movement data/driving data analysis server 210 may communicate with a plurality of different mobile computing devices 220, which may include, for example, mobile user devices (e.g., smartphones, personal digital assistants, tablet and laptop computers, etc.), on-board vehicle systems, and any other mobile computing devices. Each component of a movement data/driving data analysis system 200 may include a computing device (or system) having some or all of the structural components described above for computing device 101. Additionally, although not shown in FIG. 2, any movement data/driving data analysis system 200 described herein may include various non-vehicle roadway infrastructure devices, such as toll booths, railroad crossings, traffic cameras, and road-side traffic monitoring devices. In various examples, the movement data/driving data analysis servers 210 and/or mobile computing devices 220 may be configured to communicate with such infrastructure devices, which may serve as additional data sources for movement data and/or driving data. For instance, vehicle speed, acceleration, and the like, may be obtained by road-side traffic monitoring devices and transmitted to one or more mobile computing devices 220 and/or movement data/driving data analysis servers 210.

The data analysis server 210 may be, for example, a computer server having some or all of the structural components described above for computing device 101. As described below in more detail, in some cases the data analysis server 210 may be configured to provide movement data analysis software applications to various mobile computing devices 220. The data analysis server 210 also may be configured to receive and analyze movement data (which may or may not correspond to driving data) from mobile computing devices 220, attempt to identify driving patterns based on the received movement data, and use driving patterns to identify drivers and other driving characteristics associated with the movement data. Therefore, in some embodiments, the server 210 may include one or more movement data and/or driving data analysis software applications 211, and one or more driving pattern databases 212. As described below in more detail, the server 210 may distribute a first software application 211 to mobile devices 220, for example, a movement data analysis application 211 (which may be stored as application 222 on the mobile device 220). The movement data analysis application 222 may operate on the mobile device 220 to analyze movement data and determine driving patterns within the movement data. A second software application 211 operating on the server 210, may be configured to receive and analyze the driving pattern data from the movement data analysis application 222 on the mobile device 220, and to identify a driver for the driving trip by determining and matching an observed driving pattern to a previously stored driving pattern in the driving pattern database 212.

In order to perform the functionality described above, and the additional functionality discussed in more detail below, the server 210 may include one or more processing units (e.g., single-core, dual-core, or quad-core processors, etc.) having a minimum sufficient bit size (e.g., 32-bit, 64-bit, 96-bit, 128-bit, etc.) and minimum required processor speeds (e.g., 500 MHz, 1 GHz, etc.), and sufficient volatile and nonvolatile memory (e.g., at least 256 MB of RAM, at least 5 GB of memory, etc.), in order to store movement data/driving data analysis applications (e.g., including various different versions, upgrades, etc.), establish communication sessions with and distribute applications to various mobile computing devices 220, and receive and analyze movement data/driving data from the mobile computing devices 220. Additionally, as described below, private and secure data may be transmitted between the data analysis server 210 and various mobile computing devices 220, such as private location data, movement data, driving behavior data, and personal driver/customer data, etc. Therefore, in some embodiments, server 210 may include various security and/or integration components (e.g., web servers, authentication servers) and/or various network components (e.g., firewalls, routers, gateways, load balancers, etc.). The server 210 also may provide and/or require communications over certain secure protocols or encryption techniques (e.g., FTP or SFTP, PGP, HTTP or HTTPS, SOAP, XML encryption, etc.), in order to protect the private or secure data transmitted between the server 210 and various mobile computing devices 220.

The movement data/driving data analysis system 200 in these examples may also include a plurality of mobile computing devices 220. As discussed below, in some embodiments, mobile computing devices 220 may receive and execute a movement data analysis software application 222 from the server 210 or other application provider (e.g., an application store or third-party application provider). As part of the execution of the movement data analysis software application 222, or implemented as separate functionality, mobile computing device 220 may receive and analyze movement data from movement sensors 223 of the mobile device 220, identify driving patterns based on the received movement data, and use driving patterns to identify drivers associated with the movement data. Accordingly, in some embodiments, a mobile computing device 220 may include one or more processing units having a minimum sufficient bit size (e.g., 32-bit, 64-bit, etc.) and minimum required processor speeds (e.g., 233 MHz, 500 MHz, etc.), and sufficient volatile and nonvolatile memory (e.g., at least 256 MB of RAM, at least 1 GB of memory, etc.), in order to store and execute one or more such movement data analysis software applications, and to establish communication sessions with a data analysis server 210 and/or various other devices (e.g., on-board vehicle systems, other mobile devices 220, etc.) to transmit or receive movement data, driving pattern data, etc. Additionally, mobile computing devices 220 may receive and transmit private or secure data, such as private location data, movement data, driving behavior data, and personal driver/customer data, etc. Therefore, in some embodiments, mobile computing devices 220 may include various network components (e.g., firewalls, routers, gateways, load balancers, etc.), and may provide and/or require communications over certain secure protocols or encryption techniques (e.g., FTP or SFTP, PGP, HTTP or HTTPS, SOAP, XML encryption, etc.), in order to protect the private or secure data transmitted between the mobile device 220 and other devices.

Mobile device 220, which may be a smartphone, personal digital assistant, tablet computer, on-board vehicle system, etc., may include some or all of the elements described above with respect to the computing device 101. In this example, mobile device 220 includes a network interface component 221, which may include various network interface hardware (e.g., LAN interfaces, WAN modems, or wireless transceivers, etc.) and software components to enable mobile device 220 to communicate with one or more movement data/driving data analysis servers 210, other mobile devices 220, and various other external computing devices (e.g., application stores, third-party driving data servers, etc.). As shown in FIG. 2, a movement data analysis software application 222 may be stored in the memory of the mobile device 220. The movement data analysis software application 222 may be received via network interface 221 from server 210 or other application provider (e.g., an application store).

Mobile computing devices 220 may include one or more movement sensors 223 configured to detect, generate, and collect movement data when the device 220 is moved. Movement sensors 223 may include, for example, GPS sensors, accelerometers, speedometers, compasses, and gyroscopes. Additional movement sensors 223 may include certain sensors that might not be specifically designed to detect movement, but nonetheless may be used to detect movement by collecting and analyzing the sensor data over time, for example, cameras, proximity sensors, and various wireless network interfaces capable of detecting access to different data networks, mobile networks, and other mobile devices (e.g., via Bluetooth). Different mobile devices 220 may include different sets of movement sensors 223. For instance, one smartphone may include only an accelerometer and a clock to collect and store device acceleration data and corresponding time data, while another smartphone or vehicle on-board computer may include an accelerometer, clock, speedometer, and compass (to collect speed and directional data), etc.

The memory of the mobile device 220 also may include one or more databases or other storage arrangements 225. Databases 225 may be configured to receive and store, for example, movement data collected by the movement sensors 223 of the mobile device 220, before that data is analyzed using the movement data analysis software application 222. In some cases, database 225 also may store the driving pattern data for one or more users of the mobile device 220. Driving pattern data, discussed in more detail below, may include one or more sets of movement data samples or calculations that may be used to identify a particular driver associated with observed driving data. Database 225 may store driving pattern data for the device owner and/or other device users (e.g., family members, friends, and/or frequent users of the device 220). In some cases, multiple driving patterns may be stored for the same user. For instance, a driver may have different observable driving patterns when driving different cars (e.g., the family minivan versus the convertible), driving with different people (e.g., driving alone versus driving with family members), driving at different times/locations (e.g., driving to work versus on the weekend), driving during different seasons/conditions (e.g., summer versus winter driving), or driving in a caravan (e.g., leading or following other known drivers). In some examples, the driving data database 225 may exist within the application memory for the movement data analysis software application 222, and in other examples may be stored separately as persistent data within the device memory.
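One possible, non-limiting arrangement for storing multiple driving patterns per user in a database such as database 225 is sketched below using the standard Python `sqlite3` module. The table name, the column names, the free-text `context` label (e.g., "minivan" or "convertible"), and the JSON encoding of the pattern features are all assumptions made for illustration.

```python
import json
import sqlite3
from typing import Dict


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a store with one row per user and driving context."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS driving_pattern ("
        " user_id TEXT NOT NULL,"
        " context TEXT NOT NULL,"
        " features TEXT NOT NULL,"
        " PRIMARY KEY (user_id, context))"
    )
    return conn


def save_pattern(conn: sqlite3.Connection, user_id: str,
                 context: str, features: Dict[str, float]) -> None:
    """Insert or overwrite the pattern for a given user and context."""
    conn.execute(
        "INSERT OR REPLACE INTO driving_pattern VALUES (?, ?, ?)",
        (user_id, context, json.dumps(features)),
    )
    conn.commit()


def load_patterns(conn: sqlite3.Connection,
                  user_id: str) -> Dict[str, Dict[str, float]]:
    """Return every stored pattern for a user, keyed by context."""
    rows = conn.execute(
        "SELECT context, features FROM driving_pattern WHERE user_id = ?",
        (user_id,),
    )
    return {context: json.loads(features) for context, features in rows}
```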

As shown in FIG. 2, in certain examples, a mobile device 220 may be an on-board vehicle system. In these examples, the on-board vehicle system 220 may correspond to a telematics device, vehicle computer, and/or on-board diagnostics system. The on-board vehicle system 220 may include some or all of the elements described above with respect to the computing device 101, and may include similar (or the same) components to those in other mobile user devices 220 (e.g., smartphones, tablet computers, etc.). For on-board vehicle systems 220, movement sensors 223 may further include the various vehicle sensors, including hardware and/or software components configured to receive vehicle driving data collected by the various vehicle sensors. For example, vehicle sensors may detect and store data corresponding to the vehicle's speed, distances driven, rates of acceleration or braking, and specific instances of sudden acceleration, braking, turning, and swerving. Sensors also may detect and store data received from the vehicle's internal systems, such as headlight usage, brake light operation, door opening and closing, door locking and unlocking, cruise control usage, hazard lights usage, windshield wiper usage, horn usage, turn signal usage, seat belt usage, phone and radio usage within the vehicle, maintenance performed on the vehicle, and other data collected by the vehicle's computer systems. Additional vehicle sensors may detect and store data relating to the maintenance of the vehicle, such as the engine status, oil level, engine coolant temperature, odometer reading, the level of fuel in the fuel tank, the level of charge in the battery (e.g., for hybrid or electric cars), engine revolutions per minute (RPMs), and/or tire pressure. Certain vehicles also may include cameras and/or proximity sensors capable of recording conditions inside or outside of the vehicle, as well as sensors configured to collect data associated with a driver's movements or the condition of a driver, for example, sensors that monitor a driver's movements, such as the driver's eye position and/or head position, etc. Additional safety or guidance-assistance features may be included in some vehicles, detecting and storing data such as lane departures, activation of adaptive cruise control, blind spot alerts, etc.

In still other examples, the mobile device 220 may be a user device as described above (e.g., a smartphone, personal digital assistant, or tablet computer, etc.), and also may include a vehicle interface component to allow the mobile device to establish communication with an on-board vehicle system. For example, either the mobile device 220 or a vehicle may be implemented with hardware (e.g., an input port or docking station) and/or software (e.g., network interfaces, secure protocols and encryption, etc.), and may be designed and configured to establish communication (using a wired or wireless connection) between the mobile device 220 and an on-board vehicle system. For example, a smartphone or tablet computer 220, which is often carried by a user, may include an on-board vehicle system interface to detect and/or connect to an on-board vehicle system whenever the user is driving (and/or riding as a passenger) in a vehicle. After a mobile device 220 establishes communication with an on-board vehicle system, which may be a telematics device, on-board diagnostic system, vehicle navigation device, or other vehicle computer system, the mobile device 220 may receive vehicle sensor data collected by various vehicle sensors. Thus, non-vehicle based mobile devices 220 (e.g., smartphones or tablet computers) may use vehicle interfaces to receive some or all of the same vehicle sensor data and driving data that is accessible to on-board vehicle systems 220, discussed above.

The movement data collected by the movement sensors 223 of the mobile device 220, or received from another mobile device 220, may be stored in the memory of the mobile device 220 and/or transmitted to the server 210. This movement data may be analyzed by the mobile device 220 and/or by server 210 (e.g., using a movement data analysis software application) to determine when the movement data corresponds to a driving pattern, and to use driving patterns to determine a driver and other characteristics of a driving trip. For instance, mobile device 220 or computing device 101 may be a standalone device capable of performing all of the functions described throughout this disclosure.

In an aspect of the disclosure, time series data recorded during a vehicle trip may contain several layers of information, including but not limited to information about the road, traffic, the vehicle, and the driver. Several frameworks may be employed for analysis of collected time series data. The first framework computes features at trip level and characterizes a time series by a set of statistical quantities which summarize the entire trip. The second framework breaks the time series up into small and partially overlapping window frames and computes features for each window frame separately.

In an embodiment, collected time series data may be pre-processed. For example, for testing purposes on labeled data (Collectr data), a minimum number of driver and non-driver trips may be selected. In an embodiment, a switch may be included to select test settings where negative samples are either restricted to be car passenger trips or, more generally, non-driver trips, i.e. trips with other modes of transport e.g. train, bus, plane, etc. Such non-driver trips may be included as part of the negative samples. In an embodiment, tails of the trip recorded after the trip has ended are cut off and a minimum-length filter for the resulting trips may be reapplied. Next, zero speed portions of trip time series may be excised. In addition, the trip time series may be resampled at constant intervals with a variety of interpolation and regression techniques, e.g. linear interpolation, spline interpolation or Savitzky-Golay filtering, before ingesting the trip into the pipeline.
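As a minimal sketch of two of the pre-processing steps named above, the following Python code removes zero speed samples and resamples a series at constant intervals by linear interpolation. The function names, the `eps` threshold, and the choice of linear interpolation (one of the several techniques listed) are illustrative assumptions, not details taken from the disclosure.

```python
from bisect import bisect_right


def drop_zero_speed(times, speeds, eps=0.0):
    """Keep only samples whose speed exceeds eps (hypothetical threshold)."""
    kept = [(t, s) for t, s in zip(times, speeds) if s > eps]
    return [t for t, _ in kept], [s for _, s in kept]


def resample_linear(times, values, step):
    """Linearly interpolate onto a grid with constant spacing `step`.

    Assumes `times` is strictly increasing.
    """
    if len(times) < 2:
        return list(times), list(values)
    count = int((times[-1] - times[0]) / step + 1e-9) + 1
    grid, out = [], []
    for i in range(count):
        t = times[0] + i * step
        # Index of the segment [times[j], times[j + 1]] that contains t.
        j = min(max(bisect_right(times, t) - 1, 0), len(times) - 2)
        t0, t1 = times[j], times[j + 1]
        w = (t - t0) / (t1 - t0)
        grid.append(t)
        out.append(values[j] + w * (values[j + 1] - values[j]))
    return grid, out
```

Under these assumptions, a gap in the recorded samples is bridged by a straight line, which is why the number of points may differ before and after resampling.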

In an embodiment of the disclosure, features for driver detection may be classified into two categories, routine-based and control-based. The routine-based category may make use of the fact that people are creatures of habit. For instance, 98 percent of the time a particular driver may take the same route every day to work at around the same time.

In an aspect of the disclosure, a user's driving pattern may be thought of as a fingerprint as each driver exhibits various driving tendencies. These tendencies may include but are not limited to frequent braking, fast decelerations or accelerations, typical driving times (day versus night), distance of trips, number of turns on a driving trip, average speed, driving/not driving in various weather conditions, braking characteristics, phone handling patterns, wearing of seat belts, radio operation, and driving behavior. Additional driving characteristics that may be exhibited by a driver may include but are not limited to vehicle rpm, idling time, speed, vertical/horizontal acceleration, and start and end time of trip.

Empirical investigation reveals that driver authentication in a supervised learning setting can often be carried out by looking at trip-level features that capture the routine of a user. For example, the starting and ending location of trips may provide sufficient information for a decision-making process.

Routine features that are computed at trip level may include latitude and longitude of starting location, latitude and longitude of ending location, start time and end time normalized by day or week, duration of trip, etc. For instance, FIG. 3 illustrates routine level features according to one or more aspects of the disclosure. In FIG. 3, trip-level features are shown in table 302. For each of trips 304, 306, and 308, trip-level features shown in table 302 may be captured and used to determine a driver of the associated vehicle.

In another aspect of the disclosure, a driver may have frequent identical driving trip events. For instance, a student “X” may leave his/her house every Monday through Friday at 8:15 am to arrive at school on time at 9:00 am. This student may take the exact same route to and from school each day. Each day the vehicle is driven to school, the number of right and left turns for each driving trip may be the same. Similarly, the mileage and driving time for each driving trip may also be very close within a reasonable tolerance. Such a consistent driving pattern may be associated with student “X” based on a number of such identical driving trips.

In an embodiment, a driving pattern may be generated by determining stopping points and the total number of turns with the received movement data. In some cases, a stopping point during a driving trip may correspond to a stop sign, stoplight, or other intersection stopping point, yielding or merging in traffic, stop-and-go traffic conditions, etc. Additionally, parking a vehicle for an extended period of time during a driving trip, or at the end of a driving trip, may be a stopping point. In an embodiment, driving patterns may be based on speed data, acceleration or braking data, or other movement data occurring during a driving trip.

The control-based category of features may consist of physical variables over which the user has direct influence in the short term. Control-based features may include but are not limited to gas pedal positions, brake pedal positions, steering wheel angle as well as various physical parameters closely related to and derived from quantities such as speed, acceleration, jerk, angular speed, angular acceleration, angular jerk and power per mass ratio, etc. In an embodiment, control-based features may be computed both at a trip level and at a window frame level.

FIG. 4 is a diagram illustrating control level features according to one or more aspects of the disclosure. As shown in FIG. 4, control level features are extracted from the GPS time series by looking at 30-50 second frames 402 which scan the time series. Stroboscopic snapshots 404 are taken at regular intervals and statistical, dynamic and spectral quantities are extracted (FIG. 4, 406) for each of the physical parameters in this table.

In an aspect of the disclosure, a computing device such as a mobile phone may collect sensor data regarding a driving trip. The sensor data may include velocity, acceleration, GPS data, time of day, duration of trip, distance traveled, etc.

In an aspect of the disclosure, different models for feature extraction may be applied to the routine-based features and the control-based features. In an embodiment, for routine features a Gradient Boosting Machine (GBM) model may be used. Routine features may be extracted from the data and may yield exactly one feature vector sample per trip. In an embodiment, a feature vector for the GBM model may be represented by:

$x_{gbm} = {\begin{bmatrix}{{start}\mspace{14mu} {latitude}} \\{{start}\mspace{14mu} {longitude}} \\{{end}\mspace{14mu} {latitude}} \\{{end}\mspace{14mu} {longitude}} \\{{start}\mspace{14mu} {time}} \\{{end}\mspace{14mu} {time}} \\{duration}\end{bmatrix} = \begin{bmatrix}B_{start} \\L_{start} \\B_{end} \\L_{end} \\t_{start} \\t_{end} \\T\end{bmatrix}}$
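The feature vector above can be assembled from the first and last samples of a trip. The sketch below is one possible way to do so; the function names and the input layout (a chronological list of `(datetime, latitude, longitude)` tuples) are hypothetical, and expressing start and end time as a fraction of the day is an assumed reading of "normalized by day".

```python
from datetime import datetime

SECONDS_PER_DAY = 24 * 60 * 60


def time_of_day_fraction(ts):
    """Map a datetime to [0, 1), i.e. a time normalized by day (assumption)."""
    return (ts.hour * 3600 + ts.minute * 60 + ts.second) / SECONDS_PER_DAY


def routine_feature_vector(points):
    """Build [B_start, L_start, B_end, L_end, t_start, t_end, T] for one trip.

    `points` is a chronological list of (datetime, latitude, longitude).
    """
    t0, lat0, lon0 = points[0]
    t1, lat1, lon1 = points[-1]
    return [
        lat0,
        lon0,
        lat1,
        lon1,
        time_of_day_fraction(t0),
        time_of_day_fraction(t1),
        (t1 - t0).total_seconds(),  # duration T in seconds
    ]
```

Each trip yields exactly one such vector, which matches the one-sample-per-trip property described for the routine features.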

In another embodiment, for routine feature classification, a tree-based method model such as a random forest (RF) model may be utilized. A tree-based model may be useful as latitude and longitude coordinates can be unwieldy for standard normalization techniques. As a result, methods that require these features to be normalized to zero mean and unit variance might suffer from the presence of GPS coordinates. In an embodiment, the output of decision trees is invariant under monotone feature transformations so that normalization and feature scaling may be unnecessary. Another advantage of decision trees, albeit somewhat diminished in a random forest setting, is that they are more resilient to the inclusion of irrelevant features. To reduce bias and variance of a decision tree, ensemble methods may be utilized.

In an aspect of the disclosure, for control-based features a Gaussian Mixture Model (GMM) may be used. In an embodiment, control-based features may be extracted by the GMM at a window frame level. In an embodiment, a time series of at least the following physical parameters may be determined: speed (s), acceleration (a), jerk (j), angular speed (ω), angular acceleration (α), angular jerk (χ) and power per mass (ppm) ratio.

In an embodiment, from each of these physical parameters features may be extracted. These features may be inputted as a time series for each of these physical parameters. For instance, inputs may include a time series for velocity and a time series for acceleration. In an embodiment, examination of the different time series may be done by taking snapshots of portions of the time series. Each snapshot may include a small portion of the overall time series from which features may be extracted. In an embodiment, features that are extracted may be statistical, dynamic, and spectral in nature. For example, statistical features may include the maximum, minimum, mean, standard deviation, skewness, kurtosis, quartiles, autocorrelation and entropy for each of the parameters.

The speed (s) and course (c) time series data may be directly obtained from a GPS signal. The course time series data may be used to compute the angular speed (ω) time series as follows:

$\omega_{i} = {\frac{\pi}{180{^\circ}}\left\{ \begin{matrix}\frac{c_{i + 1} + \left( {{360{^\circ}} - c_{i}} \right)}{t_{i + 1} - t_{i}} & {{{if}\mspace{14mu} c_{i + 1}} < {90{^\circ}\mspace{14mu} {and}\mspace{14mu} c_{i}} > {270{^\circ}}} \\\frac{c_{i + 1} - c_{i} - {360{^\circ}}}{t_{i + 1} - t_{i}} & {{{{if}\mspace{14mu} c_{i + 1}} > {270{^\circ}\mspace{14mu} {and}\mspace{14mu} c_{i}} < {90{^\circ}}}\mspace{11mu}} \\\frac{c_{i + 1} - c_{i}}{t_{i + 1} - t_{i}} & {otherwise}\end{matrix} \right.}$
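The piecewise rule above handles the wrap-around of the course angle at 0°/360°. A minimal Python sketch of the same rule follows; the function name and the list-based inputs are illustrative assumptions.

```python
import math


def angular_speed(course_deg, times):
    """Angular speed in rad/s from GPS course (degrees) and timestamps (s).

    Returns one value per consecutive pair of samples.
    """
    omega = []
    for i in range(len(course_deg) - 1):
        c0, c1 = course_deg[i], course_deg[i + 1]
        dt = times[i + 1] - times[i]
        if c1 < 90.0 and c0 > 270.0:
            # Crossed north while the course was increasing.
            delta = c1 + (360.0 - c0)
        elif c1 > 270.0 and c0 < 90.0:
            # Crossed north while the course was decreasing.
            delta = c1 - c0 - 360.0
        else:
            delta = c1 - c0
        omega.append(math.radians(delta) / dt)
    return omega
```

For example, a course change from 350° to 10° over one second is treated as a turn of +20°, not −340°.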

Next, in an embodiment, speed and angular speed may be differentiated once to obtain the acceleration (a) and angular acceleration (α) time series, and twice to obtain the jerk (j) and angular jerk (χ) time series.

$\begin{matrix}{a = \frac{ds}{dt}} & {j = \frac{da}{dt} = \frac{d^{2}s}{dt^{2}}} \\{\alpha = \frac{d\omega}{dt}} & {\chi = \frac{d\alpha}{dt} = \frac{d^{2}\omega}{dt^{2}}}\end{matrix}$

In an embodiment, the ppm ratio time series may be computed as the time derivative of the speed squared.

${ppm} = {{\frac{d}{d\; t}\left( s^{2} \right)} \propto {\frac{d}{d\; t}\left( \frac{E_{kin}}{m} \right)}}$

The numerical differentiation may be performed with a two-sided four-point first-order finite difference approximation with O(h⁴) error in the intermediate points of a time series and with a one-sided four-point first-order finite difference formula with O(h³) error at the end points of a time series. For a time series with N points one may obtain:

$f_{i}^{\prime} \approx \left\{ \begin{matrix}\frac{2f_{i + 3} - 9f_{i + 2} + 18f_{i + 1} - 11f_{i}}{6h} & {\text{if } i = 0, 1} \\\frac{11f_{i} - 18f_{i - 1} + 9f_{i - 2} - 2f_{i - 3}}{6h} & {\text{if } i = N - 1, N - 2} \\\frac{- f_{i + 2} + 8f_{i + 1} - 8f_{i - 1} + f_{i - 2}}{12h} & {\text{if } i \in \left\lbrack 2, N - 3 \right\rbrack}\end{matrix} \right.$
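A minimal Python sketch of this differentiation scheme is given below. The function name is hypothetical, the series is assumed to be sampled at a constant interval h, and the backward end-point stencil is written in the standard form (11f_i − 18f_{i−1} + 9f_{i−2} − 2f_{i−3})/(6h), which is the mirror image of the forward stencil.

```python
def differentiate(f, h):
    """First derivative of an evenly sampled series f with spacing h."""
    n = len(f)
    if n < 5:
        raise ValueError("at least five samples are required")
    d = [0.0] * n
    for i in range(n):
        if i < 2:
            # One-sided forward stencil, O(h^3) error.
            d[i] = (2 * f[i + 3] - 9 * f[i + 2] + 18 * f[i + 1] - 11 * f[i]) / (6 * h)
        elif i > n - 3:
            # One-sided backward stencil, O(h^3) error.
            d[i] = (11 * f[i] - 18 * f[i - 1] + 9 * f[i - 2] - 2 * f[i - 3]) / (6 * h)
        else:
            # Two-sided central stencil, O(h^4) error.
            d[i] = (-f[i + 2] + 8 * f[i + 1] - 8 * f[i - 1] + f[i - 2]) / (12 * h)
    return d
```

Under these assumptions, acceleration may be obtained as `differentiate(speed, h)`, jerk by applying the function a second time, and the ppm ratio as `differentiate([s * s for s in speed], h)`.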

Before differentiation is applied, the time series may be re-interpolated at constant intervals with a variety of interpolation and regression techniques, e.g. linear interpolation, spline interpolation or Savitzky-Golay filtering so as to allow the proper application of the finite difference formulas. The number of points in the time series may change as a result, especially if there are gaps in the time series.

The approximate frequency of the time series is 1 Hz as the data being utilized is GPS data. The window frame width may be selected to be between 30-50 seconds with an overlap of ca. one third of the window width between consecutive frames. In an embodiment, empirical cross-validation studies suggest that the optimal window frame length and sampling rate are ca. 3 s and 10 Hz, respectively. This may indicate that other sensors may be better suited for use, although they may also come with signal processing challenges.
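The framing step can be sketched as follows. The function name is hypothetical, the frame width is given in samples (equal to seconds at 1 Hz), and dropping a trailing partial frame is an assumption made for simplicity.

```python
def window_frames(series, width, overlap_fraction=1.0 / 3.0):
    """Split a series into frames of `width` samples with partial overlap."""
    overlap = int(round(width * overlap_fraction))
    step = max(1, width - overlap)
    # Only full frames are returned; a trailing partial frame is dropped.
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]
```

With a width of 30 samples and one third overlap, consecutive frames start 20 samples apart.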

In an aspect of the disclosure, each window snapshot may yield a feature vector. The entries of the feature vector may consist of statistical, dynamic and spectral quantities for each of the aforementioned physical parameters. Statistical features in the current implementation may consist of the maximum, minimum, mean, standard deviation, skewness and kurtosis for each of the parameters from speed to ppm. Future implementations may also include quartiles, autocorrelation or entropy.
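As an illustration, the six statistical quantities listed for the current implementation might be computed per frame as sketched below. The function name is hypothetical, and the use of population (biased) moments and non-excess kurtosis is an assumption; the disclosure does not specify the estimator.

```python
import math


def statistical_features(frame):
    """Return [max, min, mean, std, skewness, kurtosis] for one frame."""
    n = len(frame)
    mean = sum(frame) / n
    # Central moments of order 2, 3 and 4 (population estimates).
    m2 = sum((v - mean) ** 2 for v in frame) / n
    m3 = sum((v - mean) ** 3 for v in frame) / n
    m4 = sum((v - mean) ** 4 for v in frame) / n
    std = math.sqrt(m2)
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    kurt = m4 / m2 ** 2 if m2 > 0 else 0.0
    return [max(frame), min(frame), mean, std, skew, kurt]
```

Applying this function to the frame of each physical parameter and concatenating the results gives the statistical part of the frame's feature vector.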

In another aspect of the disclosure, dynamic features may also be utilized. A dynamic feature of a physical parameter time series h(t) may be computed as follows:

$\frac{\sum\limits_{k = {- K}}^{K}\; {k \cdot {h\left( {t + k} \right)}}}{\sum\limits_{k = {- K}}^{K}k^{2}}$
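The expression above is a weighted slope estimate over 2K+1 neighboring samples. A minimal sketch in Python follows; the function name and the choice to raise an error near the series boundary are assumptions.

```python
def dynamic_feature(h, t, K):
    """Dynamic (slope-like) feature of series h at index t with half-width K."""
    if t - K < 0 or t + K >= len(h):
        raise IndexError("window [t - K, t + K] must lie inside the series")
    numerator = sum(k * h[t + k] for k in range(-K, K + 1))
    denominator = sum(k * k for k in range(-K, K + 1))
    return numerator / denominator
```

For a series that is exactly linear in the sample index, the feature equals the slope of the line.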

In an embodiment, computation of either spectral or cepstral coefficients may also be utilized to assist in determining driver authentication. The proper application of these features may require that the sampling frequency be within a required range. In addition, a band pass filter may need to be applied to discard noise and retain frequencies of interest. In addition, the frame width and overlap need to be carefully chosen and verified via cross-validation. Before a Discrete Fourier Transform (DFT) is applied to the time domain signal h(t) to convert it to its frequency domain signal H(ω), a windowing process may need to be performed. The application of a DFT assumes a periodic signal with a finite length and suffers if the points of a time series on the opposite ends of a frame do not meet at the same value. This generates artificial discontinuities in the time series signal which in turn produce artificial high-frequency components in the frequency domain. Energy from the frequencies of interest is moved into artificial pollutant frequencies, resulting in spectral leakage.

In order to mitigate this effect the time series h[n] for n∈{0, . . . , N−1} in the frame is multiplied point-wise by a Hamming window as follows:

${\overset{\sim}{h}\lbrack n\rbrack} = {{{{h\lbrack n\rbrack} \cdot {w\lbrack n\rbrack}}\mspace{20mu} {where}\mspace{14mu} {w\lbrack n\rbrack}} = {0.54 - {0.46\mspace{11mu} {\cos\left( \frac{2\; \pi \; n}{N - 1} \right)}}}}$

The window function deemphasizes endpoints and brings their values closer together, thus diminishing the effect of discontinuities. The subsequent application of the DFT yields the frequency domain signal

${H\lbrack k\rbrack} = {{{DFT}\left\{ {\overset{\sim}{h}\lbrack n\rbrack} \right\}} = {{\sum\limits_{n = 0}^{N - 1}\; {{\overset{\sim}{h}\lbrack n\rbrack} \cdot e^{{- i}\frac{2\; \pi \; k\; n}{N}}}} = {\sum\limits_{n = 0}^{N - 1}{{h\lbrack n\rbrack} \cdot {w\lbrack n\rbrack} \cdot e^{{- i}\frac{2\; \pi \; k\; n}{N}}}}}}$with ${k = 0},\ldots \mspace{14mu},\frac{N}{2}$

Here, $\omega = \frac{2\pi k}{N}$ is the angular frequency, with $k = \frac{N}{2}$ corresponding to the folding or Nyquist frequency 0.5 f_(s) with f_(s)=1 Hz. As already mentioned, the fundamental frequency f_(s), also known as the sampling rate, may need to be considerably higher for this approach to reach its full potential.
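The windowing and transform steps can be sketched in a few lines of Python. The function names are hypothetical, and a direct O(N²) DFT is used only to keep the example self-contained; a fast Fourier transform routine would normally be substituted.

```python
import cmath
import math


def hamming(N):
    """Hamming window w[n] = 0.54 - 0.46 cos(2 pi n / (N - 1)), for N >= 2."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def spectral_magnitudes(frame):
    """|H[k]| for k = 0..N/2 of the Hamming-windowed frame."""
    N = len(frame)
    windowed = [h * w for h, w in zip(frame, hamming(N))]
    magnitudes = []
    for k in range(N // 2 + 1):
        Hk = sum(windowed[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        magnitudes.append(abs(Hk))
    return magnitudes
```

Each returned magnitude corresponds to the frequency k·f_(s)/N, so at 1 Hz sampling the last entry sits at the 0.5 Hz Nyquist frequency.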

It should be noted that H[k] lives in the domain of complex numbers which have both a magnitude and a phase. It is common practice to consider only the magnitude of this signal for feature extraction, i.e. Ĥ[k]=|H[k]|. For each specific k, corresponding to a frequency of interest, Ĥ[k] represents one feature which is added to the existing feature vector extracted from a frame. Cepstral features are obtained by taking the log of the power spectrum and applying the Inverse Discrete Fourier Transform to the result. Only the real part of the final quantity is considered.

This characterizes the spectral envelope. The application of the logarithm to H[k] converts the convolution of signals in the time domain, which becomes the product of these signals in the frequency domain, into the addition of these signals in the cepstral domain. Different cepstral coefficients characterize the contribution of different component signals in the final output signal. Using the cepstral coefficients C[n] as features for different ranges of n can pick out individual component signals for machine learning purposes.
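One common reading of the cepstral recipe described above is C[n] = Re(IDFT{log |H[k]|²}). The sketch below follows that reading; the function name, the `floor` guard against log(0), and the direct O(N²) transforms are assumptions made for illustration, and the input frame is assumed to be already windowed.

```python
import cmath
import math


def real_cepstrum(frame, floor=1e-12):
    """Real cepstrum: real part of the inverse DFT of the log power spectrum."""
    N = len(frame)
    spectrum = [
        sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]
    # Log of the power spectrum; `floor` avoids log(0) for empty bins.
    log_power = [math.log(max(abs(Hk) ** 2, floor)) for Hk in spectrum]
    return [
        (sum(log_power[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N).real
        for n in range(N)
    ]
```

Low-index coefficients describe the slowly varying spectral envelope, so a feature vector might keep only the first few entries of the result.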

In an aspect of the disclosure, the Gaussian Mixture Model (GMM) in the context of a Universal Background Model (UBM) may be used in driver authentication. Each driver identity may be represented by a GMM probability density profile and each GMM may have a fixed number of hidden states which are automatically generated during training. The sensor source generating the time series data can be in a variety of different trip states. The trip states can be defined by the road type, traffic condition and driving maneuver. Each hidden state in the GMM should ideally correspond to one of these trip states, but can also represent trip states that may be harder to interpret. In order to prevent the differences between driver identities from being dominated by differences due to any other factors, comparisons between profiles are made within each hidden state, while taking into account the probability of being in the respective hidden state. Incoming feature vectors from individual trip window frames would first get evaluated as to which hidden state they correspond to. Then the profile differences within a hidden state may make the distinction between different drivers.

The general form of a Gaussian Mixture Model with K hidden states and n features is

$\begin{matrix}{{{p\left( {{x\varphi},\mu,\sum} \right)} = {\sum\limits_{k = 1}^{K}\; {\varphi_{k} \cdot \frac{1}{\left( {2\; \pi} \right)^{\frac{n}{2}}{\sum_{k}}^{\frac{1}{2}}} \cdot {\exp\left( {- \frac{\begin{matrix}\left( {x - \mu_{k}} \right)^{T} \\{\overset{- 1}{\sum\limits_{k}}\left( {x - \mu_{k}} \right)}\end{matrix}}{2}} \right)}}}}{{wit}h}{{\sum_{k}\varphi_{k}} = 1.}} & (1)\end{matrix}$

In the GMM-UBM setting each driver receives a GMM profile based on a training set of driver trip samples. The number of hidden states K is set via cross-validation. The estimation of the parameters λ=(ϕ, μ, Σ) is carried out with the Expectation-Maximization (EM) algorithm. Once these parameters are learned, incoming test samples x_(test) can be evaluated with the model. The quantity p(x_(test)|ϕ, μ, Σ) can be interpreted as the probability (more precisely probability density value) that x_(test) was generated from the distribution p.

The application of the GMM-UBM requires the normalization of feature vectors to zero mean and unit variance to avoid singular matrices Σ in equation (1). Each GMM p_(λ) thus comes with its own training set and a corresponding μ_(λ) and σ_(λ). These must be applied to any incoming test sample before evaluation. In addition, the dimension of feature vectors can become so large that the desired frame-by-frame feature extraction and evaluation may become impractical. For this reason, Principal Component Analysis (PCA) will be applied to reduce dimensionality while retaining 99% of the variance. Feature ranking algorithms based on decrease in Gini index in decision trees and mutual information filters are also employed in an offline fashion to pre-filter the feature list before feature extraction or PCA is applied.

In order to accelerate training and evaluation, the covariance matrices Σ are restricted to be diagonal. The restriction to diagonal covariance matrices is done for several reasons. First, the density modeling of M-th order full covariance matrices can be achieved with higher order diagonal covariance matrices. Second, it simplifies the computation of the determinant and the inverse of Σ in (1).
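With diagonal covariance matrices, the determinant in equation (1) reduces to a product of variances and the inverse to element-wise division. The following sketch evaluates the mixture density in the log domain under that restriction; the function names and the list-based parameter layout are illustrative assumptions.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (variances in var)."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of values."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def gmm_log_density(x, weights, means, variances):
    """log p(x | phi, mu, Sigma) for a diagonal-covariance GMM."""
    return log_sum_exp([
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ])
```

Working with log densities is a design choice that avoids numerical underflow when feature vectors have many dimensions.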

A test trip χ may be broken up into overlapping window frames. Each frame may yield a feature vector whose entries are statistical, dynamic and spectral quantities of physical control parameters. Thus χ=(x₁, . . . , x_(T)) and the classification of the entire trip depends on the classification of its frames. The total probability value for a trip is taken to be the geometric mean of the probabilities for its individual frames.

$p\left( \chi \mid \lambda \right) = \left( \prod\limits_{t = 1}^{T}p\left( x_{t} \mid \lambda \right) \right)^{\frac{1}{T}}$

It should be pointed out that the model is agnostic to the order in which the frames appear. Shuffling the frames does not affect the final result. In a supervised learning setting one could train one GMM on positive samples and a second GMM on negative ones, thus enabling a comparison between the two. This may be referred to as the GMM cohort model. In a semi-supervised setting the GMM-UBM should be used.

In an aspect of the disclosure, the Universal Background Model (UBM) may provide a contrasting profile to a given driver profile in order to enable a relative comparison for an incoming test sample. Evaluating a feature vector from an incoming test sample by computing its probability density value under the pertinent driver model provides an absolute number which by itself does not give us sufficient information to classify the test sample. In an embodiment, the UBM may generate a probability profile for an average background driver and enable computation of a probability density value p(χ|λ_(UBM)) for a trip χ. In an embodiment, a driver and background profile may be compared. If p(χ|λ_(driver))>p(χ|λ_(UBM)), the trip may be classified as a driver trip, otherwise it is classified as a non-driver trip. Because the probability density values are very small, the log values of the probabilities may need to be compared.
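Since the logarithm of a geometric mean equals the arithmetic mean of the logarithms, the trip-level comparison can be carried out entirely with per-frame log densities. The sketch below assumes those per-frame values have already been computed under the driver model and the UBM; both function names are hypothetical.

```python
def trip_log_score(frame_log_densities):
    """Log of the geometric mean of per-frame densities (mean of the logs)."""
    return sum(frame_log_densities) / len(frame_log_densities)


def classify_trip(driver_frame_logs, ubm_frame_logs):
    """Label a trip by comparing driver and background log scores."""
    if trip_log_score(driver_frame_logs) > trip_log_score(ubm_frame_logs):
        return "driver"
    return "non-driver"
```

Because the score is an average over frames, reordering the frames leaves the result unchanged.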

In an embodiment, the UBM may enable determination of driver authentication in a semi-supervised setting, i.e. the GMM-UBM sees only positive labels and is agnostic to negative ones. In an aspect of the disclosure, the UBM may be built as a fixed UBM, a dynamic UBM, or an adaptive UBM.

In an aspect of the disclosure, a fixed UBM may involve training a single GMM on a large set of data comprising trips from many different drivers. This fixed UBM may be evaluated quickly, but may suffer from its static composition. In an embodiment the composition of the UBM may be changed and adapted to individual drivers. In an embodiment, to achieve this fit, the fixed UBM may have to be retrained.

In another embodiment, a dynamic UBM may enable determination of driver authentication in a semi-supervised setting. The dynamic UBM may be created by taking the GMM basis model p(x|λ_(d_i)) for each driver d_(i) and expressing the final UBM as a linear combination of these functions.

${p\left( {x\lambda_{\bigcup{BM}}} \right)} = {{\sum\limits_{i = 1}^{B}\; {{\alpha_{i} \cdot {p\left( {x\lambda_{d_{i}}} \right)}}\mspace{14mu} {where}\mspace{14mu} {\sum\limits_{i = 1}^{B}\; \alpha_{i}}}} = 1}$

In an embodiment, an advantage of the dynamic UBM is that the coefficients α_(i) can be regarded as additional parameters that can be fitted to known negative samples in the training set for a driver. Given this additional knowledge about a certain driver, the coefficients for the driver in question may be adjusted in order to better represent these negative samples and thus achieve higher performance. In an embodiment, the dynamic UBM may be more malleable than the fixed UBM and may be easily adjusted by shuffling its basis functions. For example, the GMM for the driver in question should not be used in the contrasting UBM, and this GMM differs from driver to driver. The dynamic setting may allow for selectively excluding this GMM as a basis function by setting its coefficient to zero. The default setting is

$\alpha_{i} = \frac{1}{B},$

where B is the total number of basis models excluding the driver in question. If data with negative labels is available, the performance of these basis functions can be ranked according to how well they represent the negative samples. In an embodiment, the top ranked models may be taken as the basis functions and the coefficients of all other models may be set to zero.
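The dynamic UBM can be sketched as a weighted sum of already evaluated basis densities. In the code below the function names are hypothetical, and giving the top ranked models equal weight is an assumption; the disclosure only states that the remaining coefficients are set to zero.

```python
def dynamic_ubm_density(basis_densities, coefficients=None):
    """Linear combination of basis GMM densities evaluated at one sample.

    Defaults to equal coefficients 1/B when none are supplied.
    """
    B = len(basis_densities)
    if coefficients is None:
        coefficients = [1.0 / B] * B
    if abs(sum(coefficients) - 1.0) > 1e-9:
        raise ValueError("coefficients must sum to one")
    return sum(a * p for a, p in zip(coefficients, basis_densities))


def top_ranked_coefficients(negative_sample_scores, keep):
    """Equal weight for the `keep` best-scoring basis models, zero elsewhere."""
    order = sorted(
        range(len(negative_sample_scores)),
        key=lambda i: negative_sample_scores[i],
        reverse=True,
    )
    chosen = set(order[:keep])
    return [1.0 / keep if i in chosen else 0.0 for i in range(len(negative_sample_scores))]
```

Here `negative_sample_scores` stands for any measure of how well each basis model represents the known negative samples, for example an average log-likelihood.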

In another aspect of the disclosure, an adaptive UBM may enable determination of driver authentication in a semi-supervised setting. In an embodiment, the adaptive UBM may be a fixed UBM trained on all available data. In this setting the UBM is not built from driver GMMs but vice versa. In an embodiment, the fixed UBM built on the entire data set may act as a prior for the creation of a driver GMM. To derive the driver GMM from the fixed UBM, positive training samples for the driver in question may be used to update the well-trained parameters of the fixed UBM via adaptation.

In an embodiment, suppose that positively labelled driver data is available yielding a set of feature vectors χ={x₁, . . . , x_(T)}. Suppose further that λ_(UBM)=(ϕ_(k), μ_(k), Σ_(k)) with k=1 . . . K are learned. Then ∀k∈{1, . . . , K}, λ_(driver)=(ϕ̂_(k), μ̂_(k), Σ̂_(k)) can be computed as follows:

$\begin{matrix}{{\hat{\varphi}}_{k} = \gamma\left( \frac{\alpha_{k}^{\varphi}n_{k}}{T} + \left( 1 - \alpha_{k}^{\varphi} \right)\varphi_{k} \right)} & (2) \\{{\hat{\mu}}_{k} = \alpha_{k}^{\mu}E_{k}(x) + \left( 1 - \alpha_{k}^{\mu} \right)\mu_{k}} & (3) \\{{\hat{\Sigma}}_{k} = \alpha_{k}^{\sigma}E_{k}\left( x^{2} \right) + \left( 1 - \alpha_{k}^{\sigma} \right)\left( \Sigma_{k} + \mu_{k}\mu_{k}^{\prime} \right) - {\hat{\mu}}_{k}{\hat{\mu}}_{k}^{\prime}} & (4)\end{matrix}$

where

$\begin{matrix}{n_{k} = \sum\limits_{t = 1}^{T}\Pr\left( k \mid x_{t} \right), \quad E_{k}(x) = \frac{1}{n_{k}}\sum\limits_{t = 1}^{T}x_{t}\Pr\left( k \mid x_{t} \right), \quad E_{k}\left( x^{2} \right) = \frac{1}{n_{k}}\sum\limits_{t = 1}^{T}x_{t}^{2}\Pr\left( k \mid x_{t} \right)} & (5)\end{matrix}$

and

$\begin{matrix}{\Pr\left( k \mid x_{t} \right) = \frac{\varphi_{k}p_{k}\left( x_{t} \right)}{\sum\limits_{i = 1}^{K}\varphi_{i}p_{i}\left( x_{t} \right)}} & (6)\end{matrix}$

The adaptation coefficients {α_(k)^(ϕ), α_(k)^(μ), α_(k)^(σ)} regulate the balance between the old and new mixture parameters, and γ is a scale factor that ensures the adapted weights sum to one. All adaptation coefficients will be constrained to depend on a single parameter ρ, called the relevance factor, in the following fashion

$\begin{matrix}{\alpha_{k}^{\varphi} = {\alpha_{k}^{\mu} = {\alpha_{k}^{\sigma} = \frac{n_{k}}{n_{k} + \rho}}}} & (7)\end{matrix}$

In an embodiment, the relevance factor may be set to ρ=16. The above recipe for updating the UBM coefficients may be obtained as a result of solving a maximum a posteriori (MAP) estimation problem for a GMM with p(x|λ_(UBM)) acting as the prior.
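The adaptation equations (2) through (7) can be illustrated for the simplest case of scalar features, where each covariance is a single variance. The sketch below is restricted to that one-dimensional case for readability; the function names are hypothetical, and keeping the UBM statistics for a component that receives no responsibility is an assumption.

```python
import math


def gaussian_pdf(x, mu, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def map_adapt(samples, weights, means, variances, rho=16.0):
    """MAP-adapt a 1-D UBM (weights, means, variances) to driver samples."""
    K, T = len(weights), len(samples)
    # Equation (6): posterior probability of each component for each sample.
    posteriors = []
    for x in samples:
        joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        if total == 0.0:
            raise ValueError("sample has zero density under every component")
        posteriors.append([j / total for j in joint])
    new_w, new_m, new_v = [], [], []
    for k in range(K):
        # Equation (5): sufficient statistics.
        n_k = sum(p[k] for p in posteriors)
        if n_k > 0:
            e1 = sum(p[k] * x for p, x in zip(posteriors, samples)) / n_k
            e2 = sum(p[k] * x * x for p, x in zip(posteriors, samples)) / n_k
        else:
            e1, e2 = means[k], variances[k] + means[k] ** 2
        a = n_k / (n_k + rho)  # Equation (7)
        mu_hat = a * e1 + (1 - a) * means[k]  # Equation (3)
        var_hat = a * e2 + (1 - a) * (variances[k] + means[k] ** 2) - mu_hat ** 2  # (4)
        new_w.append(a * n_k / T + (1 - a) * weights[k])  # Equation (2) before scaling
        new_m.append(mu_hat)
        new_v.append(var_hat)
    scale = sum(new_w)  # the factor gamma makes the weights sum to one
    return [w / scale for w in new_w], new_m, new_v
```

Components that explain many driver samples (large n_k) move strongly toward the driver data, while rarely used components stay close to the UBM.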

In an embodiment, use of an adaptive UBM may have advantages that include retaining a semi-supervised learning setting. In addition, the adaptive UBM may allow for a faster scoring technique when compared to other UBM designs.

In another aspect of the disclosure, routine trips which are well represented in the training set may be processed with the routine feature model. Unusual trips such as road trips or business trips may be passed to the control feature model. In an embodiment, an anomaly detection framework may be generated and used to determine whether a trip is well represented in the training set. Such a framework may determine whether to use the routine feature model or the control feature model.

In an embodiment, an anomaly detection framework may include using points of interest clustering. Incoming trips may be classified as anomalies if there is a very high likelihood that the endpoints of the incoming trip do not belong to any of the known Point Of Interest (POI) clusters. If a trip is an anomaly, preference may be given to the control-based GMM model. A two-dimensional Gaussian probability distribution may be fitted to each POI cluster. In an embodiment, a two-dimensional distribution with a diagonal covariance matrix with identical entries may be utilized.

A maximum log-likelihood estimation can be performed to derive the optimal fit. Given a POI cluster $c = \left\{ x^{(i)} \right\}_{i = 1}^{m} = \left\{ \left( x_{lat}^{(i)}, x_{long}^{(i)} \right) \right\}_{i = 1}^{m}$ of latitude and longitude coordinates, the MLE gives us

${l\left( {\mu,\sum} \right)} = {{l\left( {\mu,\lambda} \right)} = {{\log {\prod\limits_{i = 1}^{m}\; {p\left( {{x^{(i)};\mu},\lambda} \right)}}} = {\sum\limits_{i = 1}^{m}{\log \; {p\left( {{x^{(i)};\mu},\lambda} \right)}}}}}$where${p\left( {{x^{(i)};\mu},\lambda} \right)} = {\frac{1}{2\; \pi}\frac{1}{\lambda}{\exp\left( {- \frac{\left( {x_{lat}^{(i)} - \mu_{lat}} \right)^{2} + \left( {x_{long}^{(i)} - \mu_{long}} \right)^{2}}{2\; \lambda}} \right)}\mspace{14mu} {and}}$$\sum{= \begin{pmatrix}\lambda & 0 \\0 & \lambda\end{pmatrix}}$

Setting the derivatives w.r.t. μ and λ to zero and solving for μ and λ yields

$\mu = \frac{1}{m}\sum\limits_{i = 1}^{m}x^{(i)}, \quad \lambda = \frac{1}{m}\sum\limits_{i = 1}^{m}\frac{\left\| x^{(i)} - \mu \right\|^{2}}{2}$

This allows the computation of a radius around the cluster mean. Samples falling within this radius have a certain probability of belonging to the respective cluster. Using this distribution, a simple integration shows that a sample belonging to a cluster with probability p must be within a radius

$R = {\left( {2\lambda \; \log \; \left( \frac{1}{1 - p} \right)} \right)^{\frac{1}{2}}.}$

For p=0.99, R=(2λ log 100)^(1/2). If ∥x−μ∥>R, a trip is an anomaly.
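The cluster fit and the radius test can be sketched as follows. The function names are hypothetical, coordinates are treated as plain planar values (no geodesic correction, which is an assumption), and the test compares the distance ∥x−μ∥ itself with R.

```python
import math


def fit_poi_cluster(points):
    """Maximum-likelihood mean and variance lambda of an isotropic 2-D Gaussian.

    `points` is a list of (latitude, longitude) pairs.
    """
    m = len(points)
    mu_lat = sum(p[0] for p in points) / m
    mu_lon = sum(p[1] for p in points) / m
    lam = sum((p[0] - mu_lat) ** 2 + (p[1] - mu_lon) ** 2 for p in points) / (2 * m)
    return (mu_lat, mu_lon), lam


def cluster_radius(lam, p=0.99):
    """Radius containing a fraction p of the fitted distribution's mass."""
    return math.sqrt(2 * lam * math.log(1.0 / (1.0 - p)))


def is_anomaly(point, clusters, p=0.99):
    """True if the point lies outside the radius of every known POI cluster."""
    for (mu_lat, mu_lon), lam in clusters:
        distance = math.hypot(point[0] - mu_lat, point[1] - mu_lon)
        if distance <= cluster_radius(lam, p):
            return False
    return True
```

A trip might be flagged when either of its endpoints is an anomaly in this sense; that combination rule is left open here.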

In another embodiment, an anomaly detection framework may include using confidence scores. Tree-based methods may assign probability scores for positive and negative samples. GMM methods assign probability density scores. These scores can be used to compute confidence values.

In yet another aspect of the disclosure, incoming test samples may first be passed through a POI or other anomaly detection module and redirected to the GMM-UBM only in case of an anomaly. In the absence of an anomaly the test sample may be directed to the GBM. If the confidence score for the GBM verdict is above a certain threshold, the GBM result is the final output. Otherwise, the GMM result is computed and returned. It may be advantageous to develop a framework where the confidence scores of GBM and GMM can be properly compared so that one can compare the confidence values and rank the results accordingly.
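The routing described above can be summarized in a short sketch. All names are hypothetical: `is_anomalous`, `routine_model` and `control_model` are placeholder callables standing in for the anomaly detection module, the GBM and the GMM-UBM, and the threshold value of 0.8 is an arbitrary illustration.

```python
def authenticate_trip(trip, is_anomalous, routine_model, control_model, threshold=0.8):
    """Route a trip to the routine (GBM) or control (GMM-UBM) model.

    routine_model(trip) returns (label, confidence);
    control_model(trip) returns a label.
    The second element of the result names the model that decided.
    """
    if is_anomalous(trip):
        return control_model(trip), "control"
    label, confidence = routine_model(trip)
    if confidence > threshold:
        return label, "routine"
    # Low-confidence routine verdicts fall back to the control model.
    return control_model(trip), "control"
```

Returning the name of the deciding model alongside the label is a design choice that makes it easier to audit how often each branch is taken.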

In another aspect of the disclosure, other sensors such as an accelerometer and/or gyroscope may be used in place of the GPS signal. The accelerometer and/or gyroscope sensor may be included in a mobile device. The accelerometer and/or gyroscope sensors may use a sampling frequency in the range of 10-100 Hz. This granularity may allow for the capture of additional information which may be absent in 1 Hz GPS signals. In an embodiment, use of the accelerometer and/or gyroscope sensors may require additional filtering to reduce signal noise. In addition, the mobile device and car coordinate systems should ideally be completely aligned. Full alignment may be possible if gyroscope and gravity sensors are utilized. Partial alignment of the z-axis may be achieved with gravity sensors only and may result in a sufficiently useful signal source for the purposes of driver detection. Finally, shock absorption in cushioned enclosures (e.g., a pocket or handbag) may dampen the signal and thus introduce additional distortions. Compensating for these distortions is extremely challenging, although not impossible in principle.
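As a rough illustration of partial z-axis alignment and noise filtering, the following Python sketch projects a device-frame acceleration sample onto the direction reported by a gravity sensor and applies a simple exponential low-pass filter. The function names, the filter choice, and the smoothing factor are assumptions made for illustration; the sign of the vertical component depends on the sensor convention of the particular platform.

```python
import math


def split_vertical_horizontal(accel, gravity):
    """Split a 3-axis acceleration sample using only the gravity direction.

    accel, gravity: (x, y, z) tuples in the device frame.
    Returns (vertical, horizontal): the signed component along the gravity
    vector and the magnitude of the remainder. Heading within the horizontal
    plane is not recovered, which is why the alignment is only partial.
    """
    g_norm = math.sqrt(sum(g * g for g in gravity))
    if g_norm == 0.0:
        raise ValueError("gravity vector must be non-zero")
    g_hat = [g / g_norm for g in gravity]
    vertical = sum(a * g for a, g in zip(accel, g_hat))
    remainder = [a - vertical * g for a, g in zip(accel, g_hat)]
    horizontal = math.sqrt(sum(r * r for r in remainder))
    return vertical, horizontal


def low_pass(samples, alpha=0.2):
    """Exponential moving average; alpha is a hypothetical smoothing factor."""
    filtered = []
    previous = None
    for s in samples:
        previous = s if previous is None else alpha * s + (1.0 - alpha) * previous
        filtered.append(previous)
    return filtered


if __name__ == "__main__":
    print(split_vertical_horizontal((3.0, 0.0, 4.0), (0.0, 0.0, 9.81)))
    print(low_pass([0.0, 10.0, 10.0], alpha=0.5))
```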

FIG. 5 illustrates an example process of determining a driver based on GPS and movement data according to one or more aspects of the disclosure. In FIG. 5, at step 502, data such as GPS data and movement data may be received. In step 504, unlabeled trip data may be extracted from the received GPS data and movement data. In step 514, labeled trip data may be extracted. In an aspect of the disclosure, a user driver profile may be determined based on the labeled trip data in step 516. In another aspect of the disclosure, a background driver profile may be determined based at least in part on the extracted labeled trip data in step 518.

In an aspect of the disclosure, a driving pattern for each trip may be determined in step 506. In an embodiment, driving data analysis server 210 (e.g., an insurance provider server 210 or other organizational server 210) may store the determined driving pattern and/or additional driving data. In some cases, a server 210 may receive determined driving patterns and driving data from a plurality of different mobile devices 220.

In step 508, a driving profile may be determined based on the determined driving pattern. In step 510, the driving profile may be compared to other driving profiles and to a generated background driver profile. In an embodiment, the background driver profile may be generated based on all known driving profiles. The background driver profile may represent an average of all known driving profiles. In step 512, the identity of the driver may be authenticated.
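The comparison in steps 510 and 512 may be sketched in Python under strong simplifying assumptions: each driving profile is represented as a fixed-length numeric vector, the background profile is the element-wise average of all known profiles, and Euclidean distance stands in for whatever scoring rule an actual embodiment uses. All function names are hypothetical.

```python
import math


def background_profile(profiles):
    """Element-wise average of all known driving profiles."""
    n = len(profiles)
    dim = len(profiles[0])
    return [sum(p[k] for p in profiles) / n for k in range(dim)]


def distance(a, b):
    """Euclidean distance between two profile vectors (illustrative metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def authenticate(trip_profile, user_profile, all_profiles):
    """Accept when the trip is closer to the user profile than to the background."""
    background = background_profile(all_profiles)
    return distance(trip_profile, user_profile) < distance(trip_profile, background)


if __name__ == "__main__":
    known = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    user = known[0]
    print(authenticate([1.1, 0.1], user, known))
    print(authenticate([2.0, 2.1], user, known))
```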

The driving data collected, when associated with the corresponding driver and/or vehicle data, may have many different applications and may be provided to different entities. For example, the data may be used for vehicle or driver insurance or financing (e.g., driving data indicating safe or unsafe driving), law enforcement (e.g., driving data indicating moving violations), and product retail or marketing entities (e.g., driving data indicating a driver's driving behaviors and habits, such as radio stations and ads listened to, routes driven and stops made, etc.).

While the aspects described herein have been discussed with respect to specific examples including various modes of carrying out aspects of the disclosure, those skilled in the art will appreciate that there are numerous variations and permutations of the above described systems and techniques that fall within the spirit and scope of the invention.

What is claimed is:
 1. A method comprising: receiving, by a computing device, vehicle sensor data collected by one or more sensors during a driving trip; identifying, by the computing device and in a training set of driver trip data, known points of interest (POI) clusters; determining, by the computing device and based on comparing the vehicle sensor data to the training set, a probability that endpoints of the vehicle sensor data belong to the known POI clusters; determining, by the computing device and based on the probability that the endpoints of the vehicle sensor data belong to the known POI clusters, whether the vehicle sensor data represents a routine driving trip; selecting, by the computing device and based on whether the vehicle sensor data represents the routine driving trip, one of a plurality of machine learning models to use to identify a driver associated with the driving trip; determining, by the computing device, from the vehicle sensor data and using the selected machine learning model, the driver associated with the driving trip; and displaying, by the computing device, information corresponding to the determined driver.
 2. The method of claim 1, further comprising: determining, based on the probability satisfying a threshold value, that the vehicle sensor data represents the routine driving trip, wherein the selected machine learning model comprises a first machine learning model, of the plurality of machine learning models, and wherein the first machine learning model uses routine-based features extracted from the vehicle sensor data to determine the driver associated with the driving trip.
 3. The method of claim 1, further comprising: causing, based on determining that the vehicle sensor data represents the routine driving trip, the selected machine learning model to perform: analyzing the vehicle sensor data to determine a driving pattern associated with the driving trip; determining, based on the driving pattern, a driving profile for the driving trip; and determining, based on comparing the driving profile to previously determined driver profiles and a background driver profile, the driver associated with the driving trip.
 4. The method of claim 3, wherein analyzing the vehicle sensor data to determine the driving pattern comprises: analyzing the vehicle sensor data to determine one or more of: a stopping point corresponding to a location at which the vehicle stopped during the driving trip, or a total number of turns during the driving trip; and determining the driving pattern associated with the driving trip, based at least in part on the stopping point or the total number of turns.
 5. The method of claim 1, further comprising: determining, based on the probability not satisfying a threshold value, that the vehicle sensor data represents an anomalous driving trip, wherein the selected machine learning model comprises a second machine learning model, of the plurality of machine learning models, and wherein the second machine learning model uses control-based features extracted from the vehicle sensor data to determine the driver associated with the driving trip.
 6. The method of claim 1, further comprising: causing, based on determining that the vehicle sensor data represents an anomalous driving trip, the selected machine learning model to perform: analyzing the vehicle sensor data to determine a plurality of time-series data; dividing each of the plurality of time-series data into a plurality of overlapping window frames of a predetermined length; for each of the plurality of overlapping window frames, analyzing corresponding data for the window frame to determine a first probability score reflecting a probability that the window frame comprises driver trip data and a second probability score reflecting a probability that the window frame comprises non-driver trip data; calculating, using the first probability scores for each of the plurality of overlapping window frames, a first total probability score reflecting a probability that the driving trip comprises driver-trip data; calculating, using the second probability scores for each of the plurality of overlapping window frames, a second total probability score reflecting a probability that the driving trip comprises non-driver trip data; and determining, based on a comparison of the first total probability score and the second total probability score, the driver associated with the driving trip.
 7. The method of claim 6, wherein the plurality of time-series data comprises control-based features corresponding to at least one of: a steering wheel angle, a gas pedal position, a brake pedal position, or one or more physical parameters, and wherein the one or more physical parameters comprise at least one of: speed, course, acceleration, jerk, angular speed, angular acceleration, angular jerk, or power per mass ratio.
 8. The method of claim 6, wherein analyzing the corresponding data for the window frame to determine the first probability score and the second probability score comprises: generating, using control-based features extracted from the corresponding data for the window frame, a feature vector for the window frame; and determining a trip state associated with the feature vector for the window frame, and wherein the method further comprises: retrieving a driver probability profile associated with a user of the computing device; generating, based on comparing the feature vector for the window frame to a portion of the driver probability profile determined to correspond to the determined trip state associated with the feature vector for the window frame, the first probability score; retrieving an average driver probability profile associated with an average driver; and generating, based on comparing the feature vector for the window frame to a portion of the average driver probability profile determined to correspond to the determined trip state associated with the feature vector for the window frame, the second probability score.
 9. A computing device comprising: a processor; and memory storing computer-executable instructions that, when executed by the processor, cause the computing device to: receive vehicle sensor data collected by one or more sensors during a driving trip; identify, in a training set of driver trip data, known points of interest (POI) clusters; determine, based on comparing the vehicle sensor data to the training set, a probability that endpoints of the vehicle sensor data belong to the known POI clusters; determine, based on the probability that the endpoints of the vehicle sensor data belong to the known POI clusters, whether the vehicle sensor data represents a routine driving trip; select, based on whether the vehicle sensor data represents the routine driving trip, one of a plurality of machine learning models to use to identify a driver associated with the driving trip; determine, from the vehicle sensor data and using the selected machine learning model, the driver associated with the driving trip; and display information corresponding to the determined driver.
 10. The computing device of claim 9, wherein the instructions, when executed by the processor, further cause the computing device to: determine, based on the probability satisfying a threshold value, that the vehicle sensor data represents the routine driving trip, wherein the selected machine learning model comprises a first machine learning model, of the plurality of machine learning models, and wherein the first machine learning model uses routine-based features extracted from the vehicle sensor data to determine the driver associated with the driving trip.
 11. The computing device of claim 9, wherein the instructions, when executed by the processor, further cause the computing device to: cause, based on determining that the vehicle sensor data represents the routine driving trip, the selected machine learning model to: analyze the vehicle sensor data to determine one or more of: a stopping point corresponding to a location at which the vehicle stopped during the driving trip, or a total number of turns during the driving trip; and determine, based at least in part on the stopping point or the total number of turns, a driving pattern associated with the driving trip; determine, based on the driving pattern, a driving profile for the driving trip; and determine, based on comparing the driving profile to previously determined driver profiles and a background driver profile, the driver associated with the driving trip.
 12. The computing device of claim 9, wherein the instructions, when executed by the processor, further cause the computing device to: determine, based on the probability not satisfying a threshold value, that the vehicle sensor data represents an anomalous driving trip, wherein the selected machine learning model comprises a second machine learning model, of the plurality of machine learning models, and wherein the second machine learning model uses control-based features extracted from the vehicle sensor data to determine the driver associated with the driving trip.
 13. The computing device of claim 9, wherein the instructions, when executed by the processor, further cause the computing device to: cause, based on determining that the vehicle sensor data represents an anomalous driving trip, the selected machine learning model to: analyze the vehicle sensor data to determine a plurality of time-series data; divide each of the plurality of time-series data into a plurality of overlapping window frames of a predetermined length; for each of the plurality of overlapping window frames, analyze corresponding data for the window frame to determine a first probability score reflecting a probability that the window frame comprises driver trip data and a second probability score reflecting a probability that the window frame comprises non-driver trip data; calculate, using the first probability scores for each of the plurality of overlapping window frames, a first total probability score reflecting a probability that the driving trip comprises driver-trip data; calculate, using the second probability scores for each of the plurality of overlapping window frames, a second total probability score reflecting a probability that the driving trip comprises non-driver trip data; and determine, based on a comparison of the first total probability score and the second total probability score, the driver associated with the driving trip.
 14. The computing device of claim 13, wherein the instructions, when executed by the processor, cause the computing device to analyze the corresponding data for the window frame to determine the first probability score and the second probability score by: generating, using control-based features extracted from the corresponding data for the window frame, a feature vector for the window frame; and determining a trip state associated with the feature vector for the window frame, and wherein the instructions, when executed by the processor, further cause the computing device to: retrieve a driver probability profile associated with a user of the computing device; generate, based on comparing the feature vector for the window frame to a portion of the driver probability profile determined to correspond to the determined trip state associated with the feature vector for the window frame, the first probability score; retrieve an average driver probability profile associated with an average driver; and generate, based on comparing the feature vector for the window frame to a portion of the average driver probability profile determined to correspond to the determined trip state associated with the feature vector for the window frame, the second probability score.
 15. A non-transitory, computer-readable storage medium storing instructions that, when executed by a processor of a computing device, cause the computing device to: receive vehicle sensor data collected by one or more sensors during a driving trip; identify, in a training set of driver trip data, known points of interest (POI) clusters; determine, based on comparing the vehicle sensor data to the training set, a probability that endpoints of the vehicle sensor data belong to the known POI clusters; determine, based on the probability that the endpoints of the vehicle sensor data belong to the known POI clusters, whether the vehicle sensor data represents a routine driving trip; select, based on whether the vehicle sensor data represents the routine driving trip, one of a plurality of machine learning models to use to identify a driver associated with the driving trip; determine, from the vehicle sensor data and using the selected machine learning model, the driver associated with the driving trip; and display information corresponding to the determined driver.
 16. The non-transitory, computer-readable storage medium of claim 15, wherein the instructions, when executed by the processor, further cause the computing device to: determine, based on the probability satisfying a threshold value, that the vehicle sensor data represents the routine driving trip, wherein the selected machine learning model comprises a first machine learning model, of the plurality of machine learning models, and wherein the first machine learning model uses routine-based features extracted from the vehicle sensor data to determine the driver associated with the driving trip.
 17. The non-transitory, computer-readable storage medium of claim 15, wherein the instructions, when executed by the processor, further cause the computing device to: cause, based on determining that the vehicle sensor data represents the routine driving trip, the selected machine learning model to: analyze the vehicle sensor data to determine one or more of: a stopping point corresponding to a location at which the vehicle stopped during the driving trip, or a total number of turns during the driving trip; and determine, based at least in part on the stopping point or the total number of turns, a driving pattern associated with the driving trip; determine, based on the driving pattern, a driving profile for the driving trip; and determine, based on comparing the driving profile to previously determined driver profiles and a background driver profile, the driver associated with the driving trip.
 18. The non-transitory, computer-readable storage medium of claim 15, wherein the instructions, when executed by the processor, further cause the computing device to: determine, based on the probability not satisfying a threshold value, that the vehicle sensor data represents an anomalous driving trip, wherein the selected machine learning model comprises a second machine learning model, of the plurality of machine learning models, and wherein the second machine learning model uses control-based features extracted from the vehicle sensor data to determine the driver associated with the driving trip.
 19. The non-transitory, computer-readable storage medium of claim 15, wherein the instructions, when executed by the processor, further cause the computing device to: cause, based on determining that the vehicle sensor data represents an anomalous driving trip, the selected machine learning model to: analyze the vehicle sensor data to determine a plurality of time-series data; divide each of the plurality of time-series data into a plurality of overlapping window frames of a predetermined length; for each of the plurality of overlapping window frames, analyze corresponding data for the window frame to determine a first probability score reflecting a probability that the window frame comprises driver trip data and a second probability score reflecting a probability that the window frame comprises non-driver trip data; calculate, using the first probability scores for each of the plurality of overlapping window frames, a first total probability score reflecting a probability that the driving trip comprises driver-trip data; calculate, using the second probability scores for each of the plurality of overlapping window frames, a second total probability score reflecting a probability that the driving trip comprises non-driver trip data; and determine, based on a comparison of the first total probability score and the second total probability score, the driver associated with the driving trip.
 20. The non-transitory, computer-readable storage medium of claim 15, wherein the instructions, when executed by the processor, cause the computing device to analyze the corresponding data for the window frame to determine the first probability score and the second probability score by: generating, using control-based features extracted from the corresponding data for the window frame, a feature vector for the window frame; and determining a trip state associated with the feature vector for the window frame, and wherein the instructions, when executed by the processor, further cause the computing device to: retrieve a driver probability profile associated with a user of the computing device; generate, based on comparing the feature vector for the window frame to a portion of the driver probability profile determined to correspond to the determined trip state associated with the feature vector for the window frame, the first probability score; retrieve an average driver probability profile associated with an average driver; and generate, based on comparing the feature vector for the window frame to a portion of the average driver probability profile determined to correspond to the determined trip state associated with the feature vector for the window frame, the second probability score.